Abstract

In a network, we define shell $\ell$ as the set of nodes at distance $\ell$ with respect to a given node and
define $r_\ell$ as the fraction of nodes outside shell $\ell$. In a transport process, information or disease
usually diffuses from a random node and reaches nodes shell after shell. Thus, understanding the shell
structure is crucial for the study of the transport properties of networks. We study the statistical
properties of the shells from a randomly chosen node. For a randomly connected network with a
given degree distribution, we derive analytically the degree distribution and average degree of the
nodes residing outside shell $\ell$ as a function of $r_\ell$. Further, we find that $r_\ell$ follows an iterative
functional form $r_\ell = \phi(r_{\ell-1})$, where $\phi$ is expressed in terms of the generating function of the
original degree distribution of the network. Our results can explain the power-law distribution of
the number of nodes $B_\ell$ found in shells with $\ell$ larger than the network diameter $d$, which is the
average distance between all pairs of nodes. For real-world networks the theoretical prediction of
$r_\ell$ deviates from the empirical $r_\ell$. We introduce a network correlation function $c(r_\ell) \equiv r_{\ell+1}/\phi(r_\ell)$
to characterize the correlations in the network, where $r_{\ell+1}$ is the empirical value and $\phi(r_\ell)$ is
the theoretical prediction. $c(r_\ell) = 1$ indicates perfect agreement between empirical results and
theory. We apply $c(r_\ell)$ to several model and real-world networks. We find that the networks fall
into two distinct classes: (i) a class of poorly-connected networks with $c(r_\ell) > 1$, which have larger
average distances compared with randomly connected networks with the same degree distributions;
and (ii) a class of well-connected networks with $c(r_\ell) < 1$. Examples of poorly-connected networks
include the Watts-Strogatz model and networks characterizing human collaborations, which include
two citation networks and the actor collaboration network. Examples of well-connected networks
include the Barabasi-Albert model and the Autonomous System (AS) Internet network.

I. INTRODUCTION AND RECENT WORK

Many complex systems can be described by networks in which the nodes are the elements
of the system and the links characterize the interactions between the elements. One of the
most common ways to characterize a network is to determine its degree distribution. A
classical example of a network is the Erdos-Renyi (ER) [1, 2] model, in which the links are
randomly assigned to randomly selected pairs of nodes. The degree distribution of the ER
model is characterized by a Poisson distribution

$$P(k) = e^{-\langle k\rangle}\,\frac{\langle k\rangle^{k}}{k!}, \tag{1}$$

where $\langle k\rangle$ is the average degree of the network. Another simple model is a random regular
(RR) graph in which each node has exactly $\langle k\rangle = \psi$ links, thus $P(k) = \delta(k - \psi)$. The
Watts-Strogatz model (WS) [3] is also well-studied, where a random fraction $\beta$ of links from
a regular lattice with $\langle k\rangle = \psi$ are rewired and connect any pair of nodes. Changing $\beta$ from
0 to 1, the WS network interpolates between a regular lattice and an ER graph. In the
last decade, it has been realized that many social, computer, and biological networks can be
approximated by scale-free (SF) models with a broad degree distribution characterized by a
power law

$$P(k) \sim k^{-\lambda}, \tag{2}$$

with a lower and upper cutoff, $k_{\min}$ and $k_{\max}$. A paradigmatic model that
explains the abundance of SF networks in nature is the preferential attachment model of
Barabasi and Albert (BA) [4].

The degree distribution is not sufficient to characterize the topology of a network. Given 
a degree distribution, a network can have very different properties such as clustering and 
degree-degree correlation. For example, the network of movie actors [4], in which two actors
are linked if they play in the same movie, although characterized by a power-law degree
distribution, has a higher clustering coefficient compared to the SF network generated by
the Molloy-Reed algorithm [10] with the same degree distribution.

Besides the degree distribution and clustering coefficient, a network is also characterized
by the average distance between all pairs of nodes, which we refer to as the network diameter
$d$. Random networks with a given degree distribution can be "small worlds" [2]

$$d \sim \ln N \tag{3}$$

or "ultra-small worlds"

$$d \sim \ln\ln N. \tag{4}$$

The diameter $d$ depends sensitively on the network topology.

Another important characteristic of a network is the structure of its shells, where shell
$\ell$ is defined as the set of nodes that are at distance $\ell$ from a randomly chosen root node [11].
The shell structure of a network is important for understanding the transport properties
of the network such as the epidemic spread [12], where the virus spreads from a randomly
chosen root and reaches nodes shell after shell. The structure of the shells is related to both
the degree distribution and the network diameter. The shell structure of SF networks has
been recently studied in Ref. [11], which introduced a new term, "network tomography",
referring to various properties of shells such as the number of nodes and open links in shell
$\ell$, the degree distribution, and the average degree of the nodes in the exterior of shell $\ell$.

Many real and model networks have fractal properties while others do not [13]. Recently
Ref. [14] reported a power-law distribution of the number of nodes $B_\ell$ in shell $\ell > d$ from a
randomly chosen root. They found that a large class of models and real networks, although
not fractal on all scales, exhibit fractal properties in boundary shells with $\ell > d$. Here we
will develop a theory to explain these findings.

II. GOALS OF THIS WORK 

In this paper, we extend the study of network tomography describing the shell structure
in a randomly connected network with an arbitrary degree distribution using generating
functions. Following Ref. [11], we denote the fraction of nodes at distance equal to or larger
than $\ell$ as

$$r_\ell = 1 - \frac{1}{N}\sum_{m=0}^{\ell-1} B_m, \tag{5}$$

and the nodes at distances equal to or larger than $\ell$ as the exterior $E_\ell$ of shell $\ell$. Similarly, we
define the "$r$-exterior", $E_r$, as the $rN$ nodes with the largest distances from a given root
node. To this end, we list all the nodes in ascending order of their distances from the root
node. In this list, the nodes with the same distance are positioned at random. The last
$rN$ nodes in this list, which have the largest distance to the root, are called $E_r$. Notice
that $E_r = E_\ell$ if $r = r_\ell$. Introducing $r$ as a continuous variable is a new step compared to
Ref. [11], which allows us to apply the apparatus of generating functions to study network
tomography.
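
As a minimal sketch of these definitions (not taken from the original work), the shell sizes $B_\ell$ and the fractions $r_\ell$ of Eq. (5) can be obtained with a breadth-first search from the root. The adjacency-list input format and the name `shell_fractions` are assumptions made here for illustration.

```python
from collections import deque

def shell_fractions(adj, root):
    """Return (B, r): B[l] is the number of nodes at distance l from root,
    r[l] is the fraction of nodes at distance >= l. Nodes that cannot be
    reached from the root are treated as infinitely distant."""
    n = len(adj)
    dist = {root: 0}
    queue = deque([root])
    B = [1]                              # shell 0 contains only the root
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                if dist[w] == len(B):    # first node found in a new shell
                    B.append(0)
                B[dist[w]] += 1
                queue.append(w)
    r = [1.0]                            # r_0 = 1: every node is at distance >= 0
    reached = 0
    for b in B:
        reached += b
        r.append(1.0 - reached / n)
    return B, r

# Hypothetical toy network: a path 0-1-2-3 with the root at node 0.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
B, r = shell_fractions(adj, root=0)
print(B, r)
```

The last entry of `r` is the fraction of nodes never reached from the root, which vanishes for a connected network.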



The behavior of $B_\ell$ for $\ell < d$ can be approximated by a branching process [15]. In shells
with $\ell > d$, the network will show different topological characteristics compared to shells
with $\ell < d$. This is due to the high probability to find high degree nodes ("hubs") in shells
with $\ell < d$, so there is a depletion of high degree nodes in the degree distribution in $E_\ell$
with $\ell > d$. Indeed, the average degree of the nodes in shells with $\ell < d$ is greater than the
average degree in the shells with $\ell > d$ [11].

Here, we develop a theory to explain the behavior of the degree distribution $P_r(k)$ in $E_r$
and the behavior of the average degree $\langle k(r)\rangle$ as a function of $r$ in a randomly connected
network with a given degree distribution. Further, we derive analytically $r_{\ell+1}$ as a function
of $r_\ell$, $r_{\ell+1} = \phi(r_\ell)$, where $\phi$ can be expressed in terms of generating functions [3] of the
degree distribution of the network. Using these derived analytical expressions, we explain
the power-law distribution $P(B_\ell) \sim B_\ell^{-2}$ for $\ell \gg d$ found in [14]. Further, based on our
approach, we introduce the network correlation function $c(r_\ell) \equiv r_{\ell+1}/\phi(r_\ell)$ to characterize
the correlations in the network. We apply this measure to several model and real-world
networks. We find that the networks fall into two distinct classes: a class of poorly-connected
networks with $c(r_\ell) > 1$, where the virus spreads from a given root slower than in randomly
connected networks with the same degree distribution; and a class of well-connected networks
with $c(r_\ell) < 1$, in which the virus spreads faster than in a randomly connected network.

In this paper we study RR, ER, SF, WS and BA models, as well as several real networks
including the Actor collaboration network (Actor) [4], the High Energy Physics citations network
(HEP) [16], the Supreme Court Citation network (SCC) [17] and the Autonomous System (AS)
Internet network (DIMES) [18]. As we will show later, WS, Actor, HEP, and SCC belong
to the class of poorly-connected networks ($c(r_\ell) > 1$), while the BA model and the DIMES network
belong to the class of well-connected networks ($c(r_\ell) < 1$).

The paper is organized as follows. In Sec. III, we derive analytically the degree distribution
and average degree of nodes in $E_r$ and test our theory on ER and SF networks. In
Sec. IV, we derive analytically a deterministic iterative functional form for $r_\ell$. In Sec. V,
we apply our theory to explain the distribution and average value of the number of nodes
in shells. In Sec. VI, we introduce the network correlation function and apply it to different
networks. Finally, we present a summary in Sec. VII.






III. DEGREE DISTRIBUTION OF NODES IN r-EXTERIOR $E_r$



A. Generating function for P(k) 



The generating function of a given degree distribution $P(k)$ is defined as [15, 19, 20, 21]

$$G_0(x) \equiv \sum_{k=0}^{\infty} P(k)\,x^{k}. \tag{6}$$

It follows from Eq. (6) that the average degree of the network $\langle k\rangle = G_0'(1)$. Following a
randomly chosen link, the probability of reaching a node with $k$ outgoing links (the degree
of the node is $k + 1$) is

$$\tilde P(k) = \frac{(k+1)P(k+1)}{\sum_{k'=0}^{\infty}(k'+1)P(k'+1)}. \tag{7}$$

Notice that

$$\sum_{k=0}^{\infty}(k+1)\,x^{k}P(k+1) = \sum_{k=1}^{\infty} k\,x^{k-1}P(k) = G_0'(x)$$

and

$$\sum_{k=0}^{\infty}(k+1)P(k+1) = G_0'(1) = \langle k\rangle,$$

where $\langle k\rangle$ is the average degree of the network. The generating function for the distribution
of outgoing links $\tilde P(k)$ is

$$G_1(x) \equiv \sum_{k=0}^{\infty}\tilde P(k)\,x^{k} = G_0'(x)/\langle k\rangle. \tag{8}$$

The average number of outgoing links, also called the branching factor of the network, is

$$\tilde k \equiv \sum_{k=0}^{\infty} k\,\tilde P(k) = G_1'(1) = \frac{G_0''(1)}{G_0'(1)} = \sum_{k=0}^{\infty}\frac{k(k+1)P(k+1)}{\langle k\rangle} = \frac{\langle k^{2}\rangle - \langle k\rangle}{\langle k\rangle}. \tag{9}$$

For ER networks, $G_0(x)$ and $G_1(x)$ have the same simple form [15],

$$G_0(x) = G_1(x) = e^{\langle k\rangle(x-1)}, \tag{10}$$

and $\tilde k = \langle k\rangle$.
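
The definitions in Eqs. (6), (8) and (9) translate directly into a few lines of code. The sketch below is illustrative only: it assumes a degree distribution stored as a dictionary `{k: P(k)}`, and the helper names and the truncation of the Poisson distribution at `kmax` are choices made here, not part of the original analysis.

```python
import math

def G0(P, x):
    """G_0(x) = sum_k P(k) x^k, Eq. (6)."""
    return sum(p * x ** k for k, p in P.items())

def G0_prime(P, x):
    """First derivative of G_0."""
    return sum(k * p * x ** (k - 1) for k, p in P.items() if k >= 1)

def G0_second(P, x):
    """Second derivative of G_0."""
    return sum(k * (k - 1) * p * x ** (k - 2) for k, p in P.items() if k >= 2)

def G1(P, x):
    """G_1(x) = G_0'(x) / <k>, Eq. (8)."""
    return G0_prime(P, x) / G0_prime(P, 1.0)

def branching_factor(P):
    """Branching factor G_0''(1) / G_0'(1), Eq. (9)."""
    return G0_second(P, 1.0) / G0_prime(P, 1.0)

def poisson(mean, kmax=100):
    """Poisson degree distribution of Eq. (1), truncated at kmax (an assumption)."""
    return {k: math.exp(-mean) * mean ** k / math.factorial(k) for k in range(kmax + 1)}

er = poisson(4.0)      # ER network with <k> = 4
rr = {3: 1.0}          # RR network with psi = 3
print(branching_factor(er), G0(er, 0.5), G1(er, 0.5))
print(branching_factor(rr), G1(rr, 0.5))
```

For the ER case the three printed numbers should reproduce Eq. (10) and the statement that the branching factor equals $\langle k\rangle$; for the RR case $G_1(x) = x^{\psi-1}$.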



B. Branching process 



For a randomly connected network, loops can be neglected and the construction of a
network can be approximated by a branching process [15, 19, 20, 21]. In such a process,
an outgoing link, no matter at which shell $\ell$ from the root node it starts, has the same
probability $\tilde P(k)$ to reach a node with $k$ outgoing links in shell $\ell + 1$. This assumption is
very good when $\ell$ is small and the preferential selection of the nodes with large degree (hubs)
in shell $\ell$ does not significantly deplete the probability of finding high degree nodes in the
further out shells. However, for $\ell > d$, the probability of finding hubs decreases significantly,
and so does the average degree $\langle k\rangle$ [11]. Another limitation of the branching process as
a model of a network is that it approximates a network as a tree without loops, while in a
network loops are likely to form for $\ell > d$. In order to find an approach that works well for
all values of $\ell$, we follow Ref. [11] and introduce a modified branching process that takes
into account the depletion of large degree nodes and the formation of loops.

At the beginning of the process, we have $N$ separate nodes, and each node has $k$ open
links, where $k$ is a random variable with a distribution $P(k)$. We start to build the network
from a randomly selected node (root). At each time step, we randomly select an open link
from shell $\ell$ of the aggregate (root and all nodes already connected to the root) and connect
this open link to another open link. There are three possible ways to select another open
link (see Fig. 1), which can belong to

(i) a free node not yet connected to the aggregate,

(ii) a node in shell $\ell + 1$,

(iii) a node in shell $\ell$.

When all the open links from shell $\ell$ are connected, we will then select an open link from
shell $\ell + 1$. By doing this, the aggregate keeps growing shell after shell until all open links
are connected. In cases (ii) and (iii), there are chances to create parallel links (two links
connecting a pair of nodes) and circular links (one link with two ends connected to the same
node). For a large network with a finite branching factor $\tilde k$, such events occur with negligible
probability.



We denote by $r \equiv r(t)$ [22] the fraction of distant nodes not connected to the aggregate
at step $t$. These nodes constitute the $r$-exterior $E_r$. At the beginning of the growth process,
before we start to build the first shell, $r(0) = (N - 1)/N \approx 1$. At the end of the growth
process, $r(t) = r_\infty$, where $r_\infty$ is the fraction of nodes that are not connected to the aggregate
when the building process is finished, i.e., when all open links in the aggregate are used.
The process described above simulates a randomly connected network, which is a good
approximation for many model and real-world networks.
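
The growth process described above can be sketched in code. The following is a hedged illustration rather than the simulation used for the figures: the function name `grow_aggregate`, the data layout, and the choice of always taking the root as node 0 by default are assumptions. Parallel and circular links are allowed, as in cases (ii) and (iii).

```python
import random

def grow_aggregate(degrees, root=0, seed=0):
    """Grow the aggregate shell by shell and return [r_0, r_1, ...], where
    r_l is the fraction of nodes at distance >= l from the root."""
    rng = random.Random(seed)
    n = len(degrees)
    owner = []                           # owner[s] = node that open link s belongs to
    links_of = [[] for _ in range(n)]
    for v, k in enumerate(degrees):
        for _ in range(k):
            links_of[v].append(len(owner))
            owner.append(v)
    pool = list(range(len(owner)))       # all links that are still open
    pos = list(range(len(owner)))        # pos[s] = index of link s in pool, -1 if used

    def close(s):
        """Remove link s from the pool in O(1) by swapping it with the last entry."""
        i, last = pos[s], pool[-1]
        pool[i], pos[last] = last, i
        pool.pop()
        pos[s] = -1

    in_aggregate = [False] * n
    in_aggregate[root] = True
    shell, reached, r = [root], 0, [1.0]
    while shell:
        reached += len(shell)
        r.append(1.0 - reached / n)
        next_shell = []
        for v in shell:
            for s in links_of[v]:
                if pos[s] < 0:           # already used by an earlier connection
                    continue
                close(s)
                if not pool:             # odd number of links: one stays unmatched
                    break
                t = pool[rng.randrange(len(pool))]
                close(t)
                w = owner[t]
                if not in_aggregate[w]:  # case (i): a free node joins shell l + 1
                    in_aggregate[w] = True
                    next_shell.append(w)
                # cases (ii) and (iii): w is already in the aggregate, nothing to add
        shell = next_shell
    return r

r = grow_aggregate([3] * 2000, seed=1)   # hypothetical RR network with psi = 3
print(r[:5], r[-1])
```

The last entry of the returned list plays the role of $r_\infty$, the fraction of nodes left outside the aggregate when no open links remain.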



C. Degree distribution and average degree of nodes in the r-exterior $E_r$

Let $A_r(k)$ be the number of nodes with degree $k$ in the $r$-exterior $E_r$ at time $t$. The
probability to have a node with degree $k$ in $E_r$ is given by [23]

$$P_r(k) = \frac{A_r(k)}{rN}. \tag{11}$$

When we connect an open link from the aggregate to a free node (case (i)), $A_r(k)$ changes
as

$$A_{r-1/N}(k) = A_r(k) - \frac{P_r(k)\,k}{\langle k(r)\rangle}, \tag{12}$$

where $\langle k(r)\rangle \equiv \sum_k P_r(k)\,k$ is the average degree of nodes in $E_r$. In the limit of $N \to \infty$,
Eq. (12) can be presented as the derivative of $A_r(k)$ with respect to $r$,

$$\frac{dA_r(k)}{dr} = N\left[A_r(k) - A_{r-1/N}(k)\right] = N\,\frac{P_r(k)\,k}{\langle k(r)\rangle}. \tag{13}$$

Differentiating Eq. (11) with respect to $r$, and using Eq. (13), we obtain

$$-r\,\frac{dP_r(k)}{dr} = P_r(k) - \frac{k\,P_r(k)}{\langle k(r)\rangle}, \tag{14}$$

which is rigorous for $N \to \infty$. Substituting

$$f \equiv G_0^{-1}(r) \tag{15}$$

in Eq. (14), we find by direct differentiation that

$$P_r(k) = P(k)\,\frac{f^{k}}{G_0(f)} \tag{16}$$

and

$$\langle k(r)\rangle = \frac{f\,G_0'(f)}{G_0(f)} \tag{17}$$

is the solution satisfying Eq. (14). Notice that $P_1(k) = P(k)$.

Eq. (16) and Eq. (17) are respectively the degree distribution and the average degree in
$E_r$, as functions of $f$. Once we know the explicit functional form for $G_0(x)$, we can invert
$G_0(x)$ to find $f = G_0^{-1}(r)$ and find analytically both $P_r(k)$ and $\langle k(r)\rangle$:

$$P_r(k) = P(k)\,\frac{\left[G_0^{-1}(r)\right]^{k}}{r}, \tag{18}$$

$$\langle k(r)\rangle = \frac{G_0^{-1}(r)\,G_0'\!\left(G_0^{-1}(r)\right)}{r}. \tag{19}$$

In a network with minimum degree $k_{\min} \ge 2$, we find by Taylor expansion that

$$\langle k(r)\rangle = k_{\min} + \frac{P(k_{\min}+1)}{P(k_{\min})^{1+\alpha}}\,r^{\alpha} + O(r^{2\alpha}), \tag{20}$$

where $\alpha = 1/k_{\min}$.

For ER networks, using Eq. (10) and Eq. (17), we find

$$\langle k(r)\rangle = \ln r + \langle k\rangle. \tag{21}$$

For $r_\infty < r < 1$, Eq. (16) can be rewritten as

$$P_r(k) = \frac{(\ln r + \langle k\rangle)^{k}}{k!}\,e^{-(\ln r + \langle k\rangle)}, \tag{22}$$

which implies that the degree distribution in the distant nodes remains a Poisson distribution
but with a smaller average degree $\langle k(r)\rangle$.
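
When $G_0$ cannot be inverted in closed form, Eqs. (18) and (19) can still be evaluated numerically. The sketch below inverts $G_0$ by bisection (a choice made here, not necessarily the method used for the figures) and checks the result against Eq. (21) for a Poisson distribution truncated at $k = 100$; all function names are hypothetical.

```python
import math

def G0(P, x):
    """G_0(x) = sum_k P(k) x^k for P given as {k: P(k)}."""
    return sum(p * x ** k for k, p in P.items())

def invert_G0(P, r, tol=1e-14):
    """Solve G_0(f) = r for f in [0, 1] by bisection; G_0 is increasing there.
    Assumes P(0) <= r <= 1 so that a solution exists."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if G0(P, mid) < r:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def exterior_distribution(P, r):
    """P_r(k) = P(k) f^k / r with f = G_0^{-1}(r), Eq. (18)."""
    f = invert_G0(P, r)
    return {k: p * f ** k / r for k, p in P.items()}

def mean_degree(Pr):
    """Average degree of a distribution given as {k: probability}."""
    return sum(k * p for k, p in Pr.items())

mean = 4.0
P = {k: math.exp(-mean) * mean ** k / math.factorial(k) for k in range(101)}
Pr = exterior_distribution(P, r=0.5)
print(mean_degree(Pr), mean + math.log(0.5))   # numerical value vs Eq. (21)
```

The two printed numbers should agree closely, and the entries of `Pr` should sum to one.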

Next, we test our theory numerically for ER networks with $N = 10^6$ nodes and different
values of $\langle k\rangle$. To obtain $P_r(k)$, we start from a randomly chosen root node, and find the
nodes in $E_r$ and their degree distribution $P_r(k)$. This process is repeated many times for
different roots and different realizations. The results are shown in Fig. 2a. The symbols are
the simulation results of the degree distribution in $E_r$ for $r = 1$, 0.5 and 0.05. The analytical
results (full lines) are computed using Eq. (22). As can be seen, the theory agrees very well
with the simulation results for both $r = 0.5$ and 0.05. We compared our theory with the
simulations also for other values of $r$ and $\langle k\rangle$ and the agreement is also excellent.



For SF networks, $G_0(x)$ and $G_1(x)$ cannot be expressed as elementary functions [15].
But for a given $P(k)$, they can be written as power series of $x$ and one can compute the
expressions in Eq. (16) and Eq. (17) numerically. In order to reduce the systematic errors
caused by estimating $P(k)$, we write $G_0(x)$ and $G_1(x)$ based on the $P(k)$ obtained from the
simulation results instead of using its theoretical form.

We built SF networks using the Molloy-Reed algorithm [10]. In Fig. 2b, the symbols
represent the simulation results for $P_r(k)$ obtained for $E_r$ of SF networks with $\lambda = 3.5$ and
$r = 1$, 0.5 and 0.1. The lines are the numerical results calculated from Eq. (16). Good
agreement between the simulation results and the theoretical predictions can be seen in
Fig. 2b. Other values of $r$ and $\lambda$ have also been tested with good agreement.



In Fig. 3a, we show the average degree $\langle k(r)\rangle$ in $E_r$ as a function of $r$ for ER networks
with different values of $\langle k\rangle$. Lines representing Eq. (21) agree very well with the numerical
results (symbols) even for very small $r$. We note that Fig. 3a shows different values of the lower
cutoff $r_\infty$ for $r$, where $\langle k(r)\rangle$ is very small. As mentioned before, $r_\infty$ is the fraction
of nodes which are not connected to the aggregate at the end of the process. In the next
section, we will present an equation for $r_\infty$.

In Fig. 3b, we present the numerical results of Eq. (17) for SF networks with different
values of $\lambda$. For a given $E_r$, $\langle k(r)\rangle$ is computed from the simulated network and the results
are averaged over many realizations. Good agreement between the theory (lines) and the
simulation results (symbols) can be seen.



IV. ITERATIVE FUNCTIONAL FORM OF $r_\ell$, THE FRACTION OF NODES
OUTSIDE SHELL $\ell$

In this section, we study the growth of the aggregate itself. Let $L(t)$ be the number of
open links belonging to the full aggregate at step $t$, and $\Lambda(t) \equiv L(t)/N$. The number of open
links belonging to shell $\ell$ of the aggregate is defined as $L_\ell(t)$ and $\Lambda_\ell(t) \equiv L_\ell(t)/N$. After we
finish building shell $\ell$ and just before we start to build shell $\ell + 1$, all the open links in the
aggregate belong to nodes in shell $\ell$, so at $t = t_\ell$, we have $\Lambda_\ell(t) = \Lambda(t)$ [24]. In the process of
building shell $\ell + 1$, $\Lambda_\ell(t)$ decreases to 0.

Next we show that both $\Lambda(t)$ and $\Lambda_\ell(t)$ can be expressed as functions of $r$. In analogy
with Eq. (9), we define the branching factor of nodes in the $r$-exterior $E_r$ as

$$\tilde k(r) \equiv \frac{\langle k^{2}(r)\rangle - \langle k(r)\rangle}{\langle k(r)\rangle}. \tag{23}$$

Using Eq. (23) and Eq. (16), $\tilde k(r)$ can be rewritten as a function of $f$ as

$$\tilde k(f) = \frac{f\,G_0''(f)}{G_0'(f)}. \tag{24}$$

Appendix A shows that $\Lambda(r)$ and $\Lambda_\ell(r)$ obey the differential equations

$$\frac{d\Lambda(r)}{dr} = -\tilde k(r) + 1 + \frac{2\Lambda(r)}{r\langle k(r)\rangle}, \tag{25}$$

$$\frac{d\Lambda_\ell(r)}{dr} = 1 + \frac{\Lambda(r)}{r\langle k(r)\rangle} + \frac{\Lambda_\ell(r)}{r\langle k(r)\rangle}. \tag{26}$$

Eq. (25) and Eq. (26) govern the growth of the aggregate. To solve them, we make the
same substitution $f \equiv G_0^{-1}(r)$ (Eq. (15)) as before. The general form of the solution for
Eq. (25) is

$$\Lambda(f) = -G_0'(f)\,f + C_1 f^{2}, \tag{27}$$

where $C_1$ is a constant. At time $t = 0$, $r = f = 1$, and $\Lambda(1) = 0$. With this initial condition,
we obtain $C_1 = G_0'(1) = \langle k\rangle$. Using Eq. (27), the general solution of Eq. (26) is

$$\Lambda_\ell(f) = G_0'(1)\,f^{2} + C_2 f, \tag{28}$$

where $C_2$ is a constant. When $r = r_\ell$, the building of shell $\ell$ is finished. At that time,
all the open links of the aggregate belong to shell $\ell$, $\Lambda(r)|_{r=r_\ell} = \Lambda_\ell(r)|_{r=r_\ell}$. If we denote
$f_\ell \equiv G_0^{-1}(r_\ell)$, $C_2 = -G_0'(f_\ell)$. Thus, the solutions of the differential equations Eqs. (25) and
(26) are

$$\Lambda(f) = G_0'(1)\,f^{2} - G_0'(f)\,f, \tag{29}$$

$$\Lambda_\ell(f) = G_0'(1)\,f^{2} - G_0'(f_\ell)\,f. \tag{30}$$

When all open links in the aggregate are used, $\Lambda = 0$, the corresponding $f \equiv f_\infty$ gives
the fraction of nodes $r_\infty = G_0(f_\infty)$ which do not belong to the aggregate when the building
process is finished. The value of $f_\infty$ must satisfy Eq. (29) with $\Lambda(f_\infty) = 0$,

$$f_\infty = G_0'(f_\infty)/G_0'(1) = G_1(f_\infty), \tag{31}$$

and from Eq. (15)

$$r_\infty = G_0(f_\infty). \tag{32}$$

Eqs. (31) and (32) imply that there exists a certain fraction of distant links and nodes not
connected to the aggregate when the building process is finished. These results are consistent
with previous work [21]. The numerical solution for Eq. (31) is discussed in Sec. V A and
Appendix B.
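
As an illustration of how Eq. (31) can be solved numerically, the sketch below uses plain fixed-point iteration for an ER network with $\langle k\rangle = 2$. The iteration scheme, the starting point $f = 0$ and the function names are assumptions of this example, not a description of the procedure in Appendix B.

```python
import math

def G1_er(f, mean):
    """G_1(f) for an ER network, Eq. (10)."""
    return math.exp(mean * (f - 1.0))

def fixed_point(G1, tol=1e-13, max_iter=100000):
    """Iterate f <- G_1(f) starting from f = 0. Because G_1 is increasing and
    G_1(0) >= 0, the sequence increases towards the smallest root in [0, 1]."""
    f = 0.0
    for _ in range(max_iter):
        f_next = G1(f)
        if abs(f_next - f) < tol:
            return f_next
        f = f_next
    return f

f_inf = fixed_point(lambda f: G1_er(f, 2.0))
r_inf = math.exp(2.0 * (f_inf - 1.0))   # r_inf = G_0(f_inf), and G_0 = G_1 for ER
print(f_inf, r_inf)
```

Starting from $f = 0$ avoids the trivial root $f = 1$, which always satisfies $f = G_1(f)$.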

When $\Lambda_\ell(f) = 0$, the construction of shell $\ell + 1$ is completed, $r = r_{\ell+1}$ and $f = f_{\ell+1}$.
Then from Eq. (30), we obtain

$$f_{\ell+1} = G_0'(f_\ell)/G_0'(1) = G_1(f_\ell), \tag{33}$$

which leads to a deterministic iterative functional form for $r_\ell$,

$$r_{\ell+1} = G_0(f_{\ell+1}) = G_0(G_1(G_0^{-1}(r_\ell))) \equiv \phi(r_\ell). \tag{34}$$

Eq. (34) allows us to make a deterministic prediction of $r_{\ell+1}$ once we know $r_\ell$.

This result is different from a similar well-known result [19] based on the physical meaning
of the generating function $G_0(r)$, which gives a fraction of nodes in the set B not directly
connected to a randomly selected fraction $1 - r$ of set A. The difference with Eq. (34) is that
set A is selected not by constructing shells around a root but randomly. Moreover, set B
may even overlap with set A.

To test our theory, we use RR networks, where $P(k) = \delta(k - \psi)$, $G_0(x) = x^{\psi}$ and
$G_1(x) = x^{\psi-1}$, then Eq. (34) reduces to

$$r_{\ell+1} = r_\ell^{\psi-1}, \tag{35}$$

which is shown as lines in Fig. 4a. The symbols in Fig. 4a are the simulation results for RR
networks with different values of $\psi$. To obtain the simulation results, at each realization
a random root is chosen and a full set of $r_\ell$ is computed. The results obtained for many
realizations are plotted as a scatter plot. Due to the homogeneity of RR networks, $r_\ell$ can
only take on discrete values. The agreement between the simulation results and Eq. (35) is
excellent, and the scattering almost cannot be observed [25].

For ER networks, Eq. (34) yields

$$r_{\ell+1} = e^{\langle k\rangle(r_\ell - 1)}, \tag{36}$$

which is valid for all $\ell > 1$. We test Eq. (36) for ER networks with different values of $\langle k\rangle$ and
the results are shown in Fig. 4b. The agreement between the theoretical predictions (lines)
and the simulation results is excellent.
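
The map of Eq. (34) can be evaluated for any degree distribution once $G_0$ is inverted numerically. The following sketch builds $\phi$ from a dictionary `{k: P(k)}` using bisection for $G_0^{-1}$ and compares it with the closed forms of Eqs. (35) and (36). The helper name `make_phi`, the bisection tolerance and the Poisson truncation at $k = 80$ are assumptions made for this example.

```python
import math

def make_phi(P):
    """Return phi(r) = G_0(G_1(G_0^{-1}(r))), Eq. (34), for P given as {k: P(k)}."""
    mean = sum(k * p for k, p in P.items())

    def G0(x):
        return sum(p * x ** k for k, p in P.items())

    def G1(x):
        return sum(k * p * x ** (k - 1) for k, p in P.items() if k >= 1) / mean

    def G0_inv(r, tol=1e-14):
        lo, hi = 0.0, 1.0
        while hi - lo > tol:             # bisection: G_0 is increasing on [0, 1]
            mid = 0.5 * (lo + hi)
            if G0(mid) < r:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    return lambda r: G0(G1(G0_inv(r)))

phi_rr = make_phi({3: 1.0})              # RR network with psi = 3
mean = 3.0
phi_er = make_phi({k: math.exp(-mean) * mean ** k / math.factorial(k) for k in range(81)})
print(phi_rr(0.9), 0.9 ** 2)                        # compare with Eq. (35)
print(phi_er(0.7), math.exp(mean * (0.7 - 1.0)))    # compare with Eq. (36)
```

Each printed pair should agree to within the tolerance of the bisection. The same function can be fed an empirical $P(k)$, which is how the SF case would be handled.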

For SF networks, Eq. (34) can be solved numerically using the values of $P(k)$ from the
generated SF network. The lines shown in Fig. 4c represent the numerical solutions of the
theory [Eq. (34)]. The symbols are the simulation results for the generated SF networks. For
$\lambda > 2.5$, a good agreement between theory and simulation results can be seen. Note that for
the very small value of $\lambda = 2.2$, the simulation results deviate slightly from the theory due
to the high probability of creating parallel and circular links (PCL) in the hubs of the randomly
connected network [26] (created by cases (ii) and (iii) in Fig. 1). We test Eq. (34) for a SF
network with $\lambda = 2.2$ allowing PCL during its construction. The results are shown in Fig. 4d
as a log-linear plot. The agreement between the theory and the simulation results for a SF
network with $\lambda = 2.2$ in the presence of PCL is very good. This shows that SF networks built
by the Molloy-Reed algorithm without PCL deviate from randomly connected networks for
very small values of $\lambda$. We will further discuss this deviation in Sec. VI A.



V. APPLICATIONS 



A. Derivation of the power-law distribution of $B_\ell$ for $\ell \gg d$

Recently, a broad power-law distribution of the number of nodes at shell $\ell$ ($\ell \gg d$), $B_\ell$,
has been reported [14]. This power-law distribution exists in many model and real networks
and is characterized by a universal form $P(B_\ell) \sim B_\ell^{-2}$ (see Fig. 5). Using Eq. (34), we will
prove this relation and explain the origin of this universal power-law distribution.

For the purpose of clarity, we use $m$ instead of $\ell$ for shells with $\ell < d$, and $n$ instead of $\ell$
for $\ell > d$. For the entire range of shell indices, $\ell$ will be used.

For infinitely large networks, we can neglect loops for $\ell < d$ and approximate the forming
of a network as a branching process [15, 19, 20, 21]. It has been reported [20] that for
shell $m$ (with $m \ll d$), the generating function for the number of nodes, $B_m$, in the shell $m$
is

$$\tilde G_m(x) = G_0(G_1(\ldots(G_1(x)))) = G_0(G_1^{m-1}(x)), \tag{37}$$

where $G_1(G_1(\ldots)) \equiv G_1^{m-1}(x)$ is the result of applying $G_1(x)$ $m - 1$ times and $P(B_m)$ is the
coefficient of $x^{B_m}$ in the Taylor expansion of $\tilde G_m(x)$ around $x = 0$. The average number of
nodes in shell $m$ is $\langle k\rangle\tilde k^{m-1}$ [15]. It is possible to show that $G_1^{m}(x)$ converges to a function of the
form $\Phi((1 - x)\tilde k^{m})$ for large $m$ [20], where $\Phi(x)$ satisfies the Poincare functional relation

$$G_1(\Phi(y)) = \Phi(y\tilde k), \tag{38}$$

where $y = 1 - x$. The functional form of $\Phi(y)$ can be uniquely determined from Eq. (38).
It is known that $\Phi(y)$ has an asymptotic functional form, $\Phi(y) = f_\infty + a\,y^{-\delta} + o(y^{-\delta})$,
where $a$ is a constant [20]. Expanding both sides of Eq. (38), we obtain

$$G_1(f_\infty) + G_1'(f_\infty)\,a\,y^{-\delta} = f_\infty + a\,\tilde k^{-\delta}y^{-\delta} + o(y^{-\delta}). \tag{39}$$

Since $G_1(f_\infty) = f_\infty$, we find

$$\delta = -\ln G_1'(f_\infty)/\ln\tilde k. \tag{40}$$

The numerical solution of $G_1(f_\infty) = f_\infty$ depends on different scenarios (see Appendix B) as

$$f_\infty \begin{cases} > 0, & \text{for } P(k = 1) \neq 0; \\ = 0, & \text{for } P(k = 1) = 0. \end{cases} \tag{41}$$

The solution for $\delta$ is (see Appendix B)

$$\delta \begin{cases} > 0, & \text{for } P(k = 1) \neq 0 \text{ or } P(k = 2) \neq 0; \\ = \infty, & \text{for } P(k = 1) = 0 \text{ and } P(k = 2) = 0. \end{cases}$$

Applying Tauberian-like theorems [20, 27] to $\Phi(y)$, which has a power-law behavior for
$y \to \infty$, it has been found [27] that the Taylor expansion coefficient of $\tilde G_m(x)$, $P(B_m)$,
behaves as $B_m^{\mu}$ with an exponential cutoff at $B_m \sim \tilde k^{m}$ and some quasi-periodic modulations
with period 1 as a function of $\log_{\tilde k} B_m$, where

$$\mu = \begin{cases} \delta - 1, & \text{for } P(k = 1) \neq 0; \\ 2\delta - 1, & \text{for } P(k = 1) = 0 \text{ and } P(k = 2) \neq 0; \\ \infty, & \text{for } P(k = 1) = 0 \text{ and } P(k = 2) = 0. \end{cases}$$

Thus, the probability distribution of the number of nodes in the shell $m$ has a power-law
tail for small values of $B_m$ [14],

$$P(B_m) \sim B_m^{\mu}, \tag{42}$$

if $P(k = 1) + P(k = 2) > 0$.
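
The exponents $\delta$ and $\mu$ follow from $f_\infty$ and the derivatives of $G_1$, so they can be computed for any concrete degree distribution. The sketch below is a hedged illustration: it assumes a supercritical network (branching factor larger than 1) given as `{k: P(k)}`, solves $f_\infty = G_1(f_\infty)$ by fixed-point iteration from $f = 0$, and applies Eq. (40) together with the case distinction for $\mu$. The function name `shell_exponents` is hypothetical.

```python
import math

def shell_exponents(P, tol=1e-13, max_iter=100000):
    """Return (f_inf, delta, mu) for a degree distribution {k: P(k)}.
    Assumes the branching factor G_1'(1) is larger than 1."""
    mean = sum(k * p for k, p in P.items())

    def G1(x):
        return sum(k * p * x ** (k - 1) for k, p in P.items() if k >= 1) / mean

    def G1_prime(x):
        return sum(k * (k - 1) * p * x ** (k - 2) for k, p in P.items() if k >= 2) / mean

    f_inf = 0.0
    for _ in range(max_iter):            # smallest root of f = G_1(f), Eq. (31)
        f_next = G1(f_inf)
        converged = abs(f_next - f_inf) < tol
        f_inf = f_next
        if converged:
            break
    slope = G1_prime(f_inf)
    if slope == 0.0:                     # P(k=1) = 0 and P(k=2) = 0
        return f_inf, math.inf, math.inf
    delta = -math.log(slope) / math.log(G1_prime(1.0))   # Eq. (40)
    mu = delta - 1.0 if P.get(1, 0.0) > 0.0 else 2.0 * delta - 1.0
    return f_inf, delta, mu

# Hypothetical example with P(k=1) = 0 and P(k=2) != 0.
f_inf, delta, mu = shell_exponents({2: 0.5, 3: 0.5})
print(f_inf, delta, mu)
```

In this example $f_\infty = 0$, $G_1'(0) = 2P(2)/\langle k\rangle = 0.4$ and the branching factor is 1.6, so $\delta = -\ln 0.4/\ln 1.6$ and $\mu = 2\delta - 1$.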

The above considerations are correct only for $m \ll d$, where the depletion of nodes with
large degree is insignificant. For $\ell > d$, we must consider the changing of $P_r(k)$.

Using Eq. (34) for the whole range of $\ell$, we can write the relation between $r_n$ for $n > d$
and $r_m$ for $m \ll d$ as

$$r_n = G_0(G_1(G_0^{-1}(G_0(G_1(G_0^{-1}(\ldots(r_m)\ldots)))))) = G_0(G_1^{n-m}(G_0^{-1}(r_m))) = G_0(G_1^{n-m}(f_m)). \tag{43}$$

Applying the same considerations as for $B_m$, we obtain

$$G_1^{n-m}(f_m) = f_\infty + a\,\tilde k^{-\delta(n-m)}(1 - f_m)^{-\delta}. \tag{44}$$

Using

$$1 - f_m = 1 - G_0^{-1}(r_m) = 1 - G_0^{-1}(1 - (1 - r_m)), \tag{45}$$

we can write a Taylor expansion for $z = 1 - r_m$ as

$$1 - G_0^{-1}(1 - z) = \frac{z}{\langle k\rangle} + o(z). \tag{46}$$

Thus, we obtain

$$G_1^{n-m}(f_m) \approx f_\infty + a\left[\frac{\tilde k^{n-m}}{\langle k\rangle}\right]^{-\delta}(1 - r_m)^{-\delta}. \tag{47}$$

Applying $G_0$ on both sides of Eq. (47) and using Taylor expansion, we obtain

$$r_n = G_0(f_\infty) + G_0'(f_\infty)\,\zeta + \frac{G_0''(f_\infty)}{2}\,\zeta^{2} + \ldots, \tag{48}$$

where $\zeta = a(\tilde k^{n-m}/\langle k\rangle)^{-\delta}(1 - r_m)^{-\delta}$. If $P(k = 1) \neq 0$, as discussed in Eq. (41) and Appendix
B, $f_\infty$ is non-zero, $G_0'(f_\infty) = \langle k\rangle G_1(f_\infty) = \langle k\rangle f_\infty$ is also non-zero, thus we can ignore the
$\zeta^{2}$ term and keep the leading non-zero term $\zeta$. If $P(k = 1) = 0$ and $P(k = 2) \neq 0$, both
$G_1(f_\infty)$ and $f_\infty$ are zero, $G_0''(f_\infty) = G_0''(0) = 2P(k = 2) \neq 0$ and then $\zeta^{2}$ is the
leading non-zero term. Thus,

$$r_n - r_\infty \approx a f_\infty \langle k\rangle^{1+\delta}\,\tilde k^{-\delta(n-m)}(1 - r_m)^{-\delta} \sim (1 - r_m)^{-\mu-1}, \qquad P(k = 1) \neq 0, \tag{49}$$

$$r_n - r_\infty \approx P(2)\,a^{2}\left[\frac{\tilde k^{n-m}(1 - r_m)}{\langle k\rangle}\right]^{-2\delta} \sim (1 - r_m)^{-\mu-1}, \qquad P(k = 1) = 0,\ P(k = 2) \neq 0. \tag{50}$$

Since $B_\ell$ increases exponentially with $\ell$ for $\ell < d$ and decreases even faster than exponentially
for $\ell > d$ [15], we can make approximations $r_n \sim B_n/N$ and $1 - r_m \sim B_m/N$ for
$n > d$ and $m < d$ respectively. Using $P(B_n)\,dB_n = P(B_m)\,dB_m$ and Eqs. (42), (49) and
(50), we obtain

$$P(B_n) \sim B_n^{-1-\mu/(\mu+1)-1/(\mu+1)} = B_n^{-2}, \tag{51}$$

which is valid for $n \gg d$.
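
The exponent in Eq. (51) follows from a change of variables. The steps below are a sketch of that calculation under the approximations stated in the text; $C$ denotes an unspecified positive constant introduced here only for illustration.

```latex
\begin{align*}
B_n &\simeq C\,B_m^{-(\mu+1)}
  && \text{(Eqs. (49), (50) with } r_n \sim B_n/N,\ 1 - r_m \sim B_m/N\text{)}, \\
B_m &\propto B_n^{-1/(\mu+1)}, \qquad
  \left|\frac{dB_m}{dB_n}\right| \propto B_n^{-1/(\mu+1)-1}, \\
P(B_n) &= P(B_m)\left|\frac{dB_m}{dB_n}\right|
  \propto B_m^{\mu}\,B_n^{-1/(\mu+1)-1}
  \propto B_n^{-\mu/(\mu+1)}\,B_n^{-1/(\mu+1)-1} = B_n^{-2}.
\end{align*}
```

The value of $\mu$ cancels in the final exponent, which is why the same $B_n^{-2}$ law appears in both cases (49) and (50).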

The power-law distribution shown by Eq. (51) indicates that fractal features exist at the
boundaries of almost all networks. Further studies of these fractal features are presented
in Ref . 



f. Q. 



B. Average number of nodes in shell $\ell$, $\langle B_\ell\rangle$

The number of nodes in shell $\ell$ can be expressed as a function of $r_\ell$ as

$$B_\ell = N(r_\ell - r_{\ell+1}). \tag{52}$$

From Eq. (34) and Eq. (52), with initial condition $r = r_m$, one can calculate $B_\ell$ for all $\ell > m$
and find $\langle B_\ell\rangle$ for $\ell > m$ using $P(B_m)$.

However, when we study a finite network, the effect of the first few shells needs to be
considered. Take a RR network as an example. From simulation data it is clear that $B_0 = 1$,
$B_1 = \psi$ and $B_2 = \psi(\psi - 1)$, and correspondingly $r_1 = 1 - 1/N$, $r_2 = 1 - (\psi + 1)/N$ and
$r_3 = 1 - (1 + \psi + \psi(\psi - 1))/N = 1 - (\psi^{2} + 1)/N$. If we apply Eq. (35) on $r_1$ and $r_2$,
the calculated $r_2 = 1 - (\psi - 1)/N$ and $r_3 = 1 - (\psi^{2} - 1)/N$ (to leading order in $1/N$) deviate from the simulated
results of $r_2$ and $r_3$ by a constant value of $2/N$. For $N \to \infty$ and large $\ell$ this deviation is
negligible. However for a finite system and small $\ell$, we have to consider this term. To cancel
this constant deviation, we modify Eq. (35) as

$$r_{\ell+1} = r_\ell^{\psi-1} - 2/N. \tag{53}$$

Using Eq. (53), starting from $r_1$, we can calculate $r_\ell$ and $B_\ell$ for any $\ell > 1$. For RR
networks, due to the homogeneity of the degree, the distribution of $B_\ell$ is a delta function,
thus $\langle B_\ell\rangle = B_\ell$. In Fig. 6 we show the theoretical predictions of $\langle B_\ell\rangle$ (full lines) together
with the simulation results (symbols) for different values of $\psi$. The simulation results are the
average over different realizations. The agreement between the theory and the simulation
results is excellent.
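
A short numerical sketch shows how Eqs. (52) and (53) generate the shell sizes of an RR network. The function name, the clamp at zero for the last shell and the cap on the number of iterations are assumptions of this example rather than part of the original calculation.

```python
def rr_shell_sizes(psi, n, max_shells=1000):
    """Iterate r_{l+1} = r_l**(psi - 1) - 2/n (Eq. (53)) starting from
    r_1 = 1 - 1/n and return [B_1, B_2, ...] with B_l = n * (r_l - r_{l+1})
    (Eq. (52))."""
    r = 1.0 - 1.0 / n
    sizes = []
    for _ in range(max_shells):
        # Clamp at zero: a fraction of nodes cannot become negative.
        r_next = max(r ** (psi - 1) - 2.0 / n, 0.0)
        sizes.append(n * (r - r_next))
        if r_next == 0.0:
            break
        r = r_next
    return sizes

B = rr_shell_sizes(psi=3, n=10**6)
print([round(b) for b in B[:4]])
```

For the first shells the output should reproduce $B_1 = \psi$ and $B_2 = \psi(\psi - 1)$, and the sizes sum to $N - 1$ because the root itself is shell 0.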

For networks with varying degree (like ER and SF), $\langle B_\ell\rangle$ cannot be directly calculated
from our theory. The reason is that for these networks, the modification needed on Eq. (34)
is not a constant but fluctuates with a magnitude of the order of $1/N$. Further, because
$\langle\phi(r_\ell)\rangle \neq \phi(\langle r_\ell\rangle)$, we cannot replace $B_\ell$ with $\langle B_\ell\rangle$ as we did for RR. As we see in Fig. 4,
Eq. (34) works well also for varying degree networks in predicting $r_{\ell+1}$ once $r_\ell$ is known. It
also works well in predicting $B_{\ell+\Delta}$ ($\Delta = 1, 2, 3, \ldots$) given a shell with big enough $B_\ell$ ($\gtrsim 10^{4}$).
It can reproduce the behavior of successive shells with 99% accuracy. However, when $B_{\ell+\Delta}$
becomes small ($< 10^{4}$), the error is relatively large.



VI. THE NETWORK CORRELATION FUNCTION c(r) 

In this section we will compare various models and real-world networks with the randomly
connected networks with the same degree distributions and introduce a new network characteristic,
the network correlation function $c(r)$, analogous to the density correlation function in
statistical mechanics [29]. For a randomly connected network, $c(r) = 1$, as for the density
correlation function in the ideal gas, while for the non-random networks the deviation of
$c(r)$ from unity characterizes their correlations on different distances from the root.



A. SF networks with A < 3 

Our theory crucially depends on the existence of the branching factor k. So we can expect 
significant deviations from our theory in the behavior of the SF networks with A < 3, for 
which k diverges for N — > oo. However, for a fixed N, the degree distribution is truncated 
by the natural cutoff fc max ~ iV 1 ^ -1 ), so that k still exists. Hence, we hypothesize that 
our theory remains valid even for A < 3 for randomly connected networks (with PCL) (see 
Fig. [7]). Another problem is that our algorithm of constructing randomly connected networks 
leads to formation of PCL. The PCL is typically forbidden in the construction algorithms of 
the network characterizing complex systems. In order to construct a network without PCL, 
one imposes significant correlations in network structure of a dissortative nature with greater 
probability of hubs to be connected to small degree nodes than in a randomly connected 
network [26|. Thus, we can predict that SF networks with A < 3 which do not include PCL 
must significantly deviate from the prediction of our theory. 

In order to characterize this deviation we define a correlation function 

c{n) = r i+1 /(j)(n), (54) 

where $r_{\ell+1}$ and $r_\ell$ characterize two successive shells of the network under investigation, while $\phi(r_\ell)$ is the prediction [Eq. (34)] for $r_{\ell+1}$ based on our theory for a randomly connected network. Accordingly, we compute $c(r_\ell)$ for several networks with $N = 10^6$ nodes with $\lambda = 2.5$ and $\lambda = 2.2$, both for the randomly connected case and for the case in which PCL are not allowed. We find in Fig. 7 that for randomly connected networks $c(r_\ell)$ is always close to 1, with the expected deviations for $r_\ell \to 0$ and $r_\ell \to 1$ caused by random fluctuations in the small first ($r_\ell \to 1$) and last ($r_\ell \to 0$) shells. In contrast, $c(r_\ell)$ is uniformly smaller than 1 for the networks without PCL. For $\lambda = 2.5$ the deviations are small because the typical number of PCL that would randomly form still constitutes a negligible fraction of the links. For $\lambda = 2.2$ the deviations are significant because in this case the chance of forming PCL is much higher. In both cases, the deviations increase with the maximal degree of the network, which can randomly fluctuate around its expected value $\sim N^{1/(\lambda-1)}\,\Gamma[(\lambda-2)/(\lambda-1)]$ [28]. The value $c < 1$ for these networks reflects the fact that, due to the absence of PCL, more nodes are attached to the next shell than in a randomly connected network. Accordingly, for such networks, the fraction of nodes not included in shell $\ell + 1$ is smaller than in randomly connected networks. Thus, SF networks without PCL for $\lambda < 3$ are disassortative, which means that the degree of a node is anti-correlated with the average degree of its neighbors. Moreover, these anti-correlations are barely visible for $\lambda > 2.5$ and increase as $\lambda$ decreases. Therefore $c(r_\ell) < 1$ can be associated with network disassortativity.
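
As an illustration of how $c(r_\ell) = r_{\ell+1}/\phi(r_\ell)$ could be measured on a given network, the following minimal sketch computes the shell fractions by breadth-first search from a root and divides by a user-supplied prediction. It rests on assumptions that are ours: $r_\ell$ is taken as the fraction of nodes at distance greater than $\ell$ from the root (our reading of "outside shell $\ell$"), the network is stored as an adjacency dictionary, and the theoretical function $\phi$ is passed in as a callable `phi` rather than implemented. The names `shell_fractions` and `correlation_function` are hypothetical.

```python
from collections import deque


def shell_fractions(adj, root):
    """Return [r_0, r_1, ...] where r_l is the fraction of nodes at distance > l."""
    n = len(adj)
    dist = {root: 0}
    shell_sizes = [1]          # shell 0 contains only the root
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                if dist[v] == len(shell_sizes):
                    shell_sizes.append(0)   # first node found in a new shell
                shell_sizes[dist[v]] += 1
                queue.append(v)
    fractions = []
    remaining = n
    for size in shell_sizes:
        remaining -= size
        fractions.append(remaining / n)
    return fractions


def correlation_function(r, phi):
    """Return pairs (r_l, c(r_l)) with c(r_l) = r_{l+1} / phi(r_l)."""
    pairs = []
    for r_now, r_next in zip(r, r[1:]):
        predicted = phi(r_now)
        if predicted > 0:       # skip points where the prediction vanishes
            pairs.append((r_now, r_next / predicted))
    return pairs


if __name__ == "__main__":
    # Toy example: a path 0-1-2-3 and a placeholder prediction phi(r) = r.
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    r = shell_fractions(path, 0)
    print(r)
    print(correlation_function(r, lambda x: x))
```

The placeholder `phi` in the usage example is not the theoretical prediction; in practice it would be replaced by a numerical evaluation of Eq. (34) for the degree distribution of the network under study, and the result would be averaged over many roots.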

B. Global measurement of correlations 

The network building process described in Sec. II corresponds to a randomly connected network for a given $P(k)$. However, real-world and model networks do not always follow the behavior described by our theory. The correlation function $c(r)$ constructed in the previous section [Eq. (34)] can be used to detect non-randomness in the network connections.

For a given degree distribution, we define poorly-connected networks as those in which $c(r_\ell) > 1$. Conversely, we define well-connected networks as those in which $c(r_\ell) < 1$. The motivation for this definition is that $c(r_\ell) > 1$ means that the number of nodes in shell $\ell$, $B_\ell = N(r_\ell - r_{\ell+1}) = N[r_\ell - c(r_\ell)\phi(r_\ell)]$, is smaller than $N[r_\ell - \phi(r_\ell)]$, the value expected for a randomly connected network with the same degree distribution. Therefore, in a poorly-connected network information or a virus spreads more slowly than in a randomly connected network, in accordance with the meaning of the term poorly-connected. Conversely, in well-connected networks information spreads faster than in a randomly connected network with the same degree distribution. Poorly-connected networks usually contain cliques of fully connected nodes. In a clique, the majority of links connect back to the already connected nodes in shell $\ell$, so the new shell $\ell + 1$ grows more slowly than in a randomly connected network with the same degree distribution.

As an example, we analyze the WS model, which is characterized by high clustering. In this case the number of links that can be used to build the next shell of neighbors is much smaller than in a randomly connected network with the same degree distribution. Thus we can expect $c > 1$, in particular for a small fraction $\beta$ of rewired links [see Fig. 8(a)]. Further, we find that the networks characterizing human collaborations are usually poorly connected [see Fig. 8(b)]. A typical example of such a network is the actor network, where a link between two actors indicates that they have played in the same movie at least once. All the actors who played in the same movie thus form a fully connected subset of the network (a "clique"). As a result, the majority of their links are not used to attract new actors but circle back to previously acquainted actors. The same holds for the Supreme Court Citation (SCC) network and the High Energy Physics citations (HEP) network in Fig. 8(b). The Actor, HEP, and SCC networks all contain a large number of highly interconnected cliques. As we see, these cliques manifest themselves in $c > 1$. In contrast, the DIMES network [18] is designed to be well-connected and, as a result, has $c < 1$.
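
The effect of clustering on $c$ can be made concrete with the limiting case of a WS ring with no rewired links ($\beta = 0$). The sketch below is a rough illustration under assumptions that are ours and are not stated in this section: each node of the ring is linked to its $\psi/2$ nearest neighbors on each side, so every shell contains exactly $\psi$ nodes until the ring closes; $r_\ell$ is the fraction of nodes at distance greater than $\ell$; and the random-network prediction for a regular network of degree $\psi$ is assumed to take the form $\phi(r) = r^{\psi-1}$. The function name `ring_lattice_c` is hypothetical.

```python
def ring_lattice_c(n, psi):
    """Pairs (r_l, c(r_l)) for a ring of n nodes, each linked to psi/2 neighbours per side.

    Assumes psi is even and that the random regular prediction is
    phi(r) = r**(psi - 1) (an assumption of this sketch).
    """
    reached = 1                       # the root
    r = [(n - reached) / n]           # r_0: fraction of nodes at distance > 0
    while reached < n:
        reached = min(n, reached + psi)   # each shell adds psi nodes until the ring closes
        r.append((n - reached) / n)
    return [(a, b / a ** (psi - 1)) for a, b in zip(r, r[1:]) if a > 0]


if __name__ == "__main__":
    for r_l, c in ring_lattice_c(1000, 4)[::50]:
        print(round(r_l, 3), round(c, 3))
```

Because the shells of the ring grow linearly while the assumed random prediction shrinks $r$ much faster, the printed values of $c$ lie well above 1 away from the first and last shells, in line with the qualitative statement that highly clustered networks are poorly connected.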

Another example of a well-connected network is the BA model, in which $c(r_\ell)$ goes linearly to zero for $r_\ell \to 0$ [Fig. 8(c)]. In the BA model a new node, which has exactly $k_{\min}$ open links, randomly attaches its links to the previously existing nodes with probabilities proportional to their current degrees. (PCL are forbidden.) One can see that for $k_{\min} > 2$, $c(r_\ell) < 1$ for all $r_\ell$ except in a small vicinity of $r_\ell = 1$. This fact is associated with the disassortative nature of the BA model, in which small-degree nodes created at the late stages of the network construction are connected with very high probability to the hubs created at the early stages. Thus, as soon as the hubs are reached during shell construction, the rest of the nodes can be reached much faster than in a randomly connected network.

The small region of $c(r_\ell) > 1$ for $r_\ell \to 1$ can be associated with the fact that the hubs, which are created at the early stages of the BA network construction, are not necessarily directly connected to each other, as they would be in randomly connected networks. Thus the initial shells of the BA model, corresponding to large $r_\ell$, grow more slowly than they would in a randomly connected network. The effect is especially strong for $k_{\min} = 1$, in which case the BA network is a tree and the distance between certain hubs can be quite large. Thus the BA model with $k_{\min} = 1$ gives an example of a network with poor connectivity between the hubs ($r_\ell \to 1$) and good connectivity among the low-degree nodes ($r_\ell \to 0$), which are directly connected to the hubs. In a network in which long connected chains of low-degree nodes are abundant, we will have poor connectivity [$c(r_\ell) > 1$] for $r_\ell \to 0$. In general, the behavior of $c(r_\ell)$ for $r_\ell \to 1$ characterizes the connectivity among the hubs, while the behavior of $c(r_\ell)$ for $r_\ell \to 0$ characterizes the connectivity among the low-degree nodes.
VII. SUMMARY

In this paper, we derive new analytical relations describing shell properties of a randomly connected network. In particular, we expand the results of Ref. [11] on network tomography using the apparatus of generating functions. We find how the degree distribution is depleted as we approach the boundaries of the network, which consist of the $r$-fraction of nodes that are most distant from a root node. We find an explicit analytical expression for the degree distribution as a function of $r$ [Eqs. (16) and (17)]. We also derive an explicit analytical relation between the values of $r$ for two successive shells $\ell$ and $\ell + 1$ [Eq. (34)]. Using this equation we construct a correlation function $c(r)$ [Eq. (54)] of the network, which characterizes the quality of the network connectedness. We apply this measure to several model and real networks. We find that human collaboration networks are usually poorly connected compared to random networks with the same degree distribution. The same is true for the WS small-world model. In contrast, we find that the Internet is a well-connected network. The same is true for the BA model. Thus our results indicate that the WS model and the BA model correctly reproduce an essential feature of the real-world networks they were designed to mimic, namely, social networks and the Internet, respectively. Finally, we apply Eq. (34) to derive the power-law distribution of the number of nodes in the shells with $\ell \gg d$ [14].
APPENDIX A: DIFFERENTIAL EQUATIONS FOR $\Lambda(r)$ AND $\Lambda_\ell(r)$

In this appendix, we derive the differential equations [Eq. (20)] for $\Lambda(t)$ and $\Lambda_\ell(t)$. At time $t$, the total number of open links in the $r$-exterior $E_r$ of the unconnected nodes is $rN\langle k(r)\rangle$. At step $t$, we connect one open link from the aggregate to another open link. There is a probability

$$\frac{r(t)\langle k(r(t))\rangle}{r(t)\langle k(r(t))\rangle + \Lambda(t)}$$

that it will be connected to a free node. Thus,

$$Nr(t+1) - Nr(t) = -\frac{r(t)\langle k(r(t))\rangle}{r(t)\langle k(r(t))\rangle + \Lambda(t)}. \tag{A1}$$

To derive the differential equation for $\Lambda(t)$, we need to consider all three scenarios illustrated in Fig. 1. If we connect an open link from the aggregate to a node that is not yet connected to the aggregate [scenario (i) in Fig. 1], on average $\Lambda(t)$ will increase by $\tilde{k}(r(t))/N$. If we connect the open link from the aggregate to another open link from either shell $\ell$ or shell $\ell + 1$ [scenarios (iii) and (ii) in Fig. 1], $\Lambda(t)$ will decrease by $1/N$. Because we connect links at random, the probability of scenario (i) is

$$\frac{r(t)\langle k(r(t))\rangle}{r(t)\langle k(r(t))\rangle + \Lambda(t)}$$

and the probability of scenarios (ii) or (iii) is

$$\frac{\Lambda(t)}{r(t)\langle k(r(t))\rangle + \Lambda(t)}.$$

Thus, we can write down the evolution of $\Lambda(t)$ as

$$\Lambda(t+1) = \Lambda(t) - \frac{1}{N} + \frac{\tilde{k}(r(t))}{N}\,\frac{r(t)\langle k(r(t))\rangle}{r(t)\langle k(r(t))\rangle + \Lambda(t)} - \frac{1}{N}\,\frac{\Lambda(t)}{r(t)\langle k(r(t))\rangle + \Lambda(t)}, \tag{A2}$$

where the first term $-1/N$ accounts for the open link of the aggregate that is used up at every step.



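For $N \to \infty$, Eqs. (A1) and (A2) lead respectively to

$$\frac{dr(t)}{dt} = -\frac{1}{N}\,\frac{r(t)\langle k(r(t))\rangle}{r(t)\langle k(r(t))\rangle + \Lambda(t)} \tag{A3}$$

and

$$\frac{d\Lambda(t)}{dt} = -\frac{1}{N} + \frac{\tilde{k}(r(t))}{N}\,\frac{r(t)\langle k(r(t))\rangle}{r(t)\langle k(r(t))\rangle + \Lambda(t)} - \frac{1}{N}\,\frac{\Lambda(t)}{r(t)\langle k(r(t))\rangle + \Lambda(t)}. \tag{A4}$$

Dividing Eq. (A4) by Eq. (A3), we obtain the differential equation for $\Lambda$ as a function of $r$:

$$\frac{d\Lambda(r)}{dr} = -\tilde{k}(r) + 1 + \frac{2\Lambda(r)}{r\langle k(r)\rangle}. \tag{A5}$$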
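
A quick consistency check of this derivation can be run for a random regular (RR) network, where every node has the same degree $\psi$, so that $\langle k(r)\rangle = \psi$ and $\tilde{k}(r) = \psi - 1$ for all $r$. In that special case the equation $d\Lambda/dr = -\tilde{k}(r) + 1 + 2\Lambda/(r\langle k(r)\rangle)$ is linear, and with the initial condition $\Lambda(1) = 0$ (our assumption, valid for $N \to \infty$) it is solved by $\Lambda(r) = \psi\,(r^{2/\psi} - r)$. The sketch below iterates the discrete recursions (A1) and (A2) and compares the result with this closed form. The function names and the starting values $r = 1 - 1/N$, $\Lambda = \psi/N$ (a single root with $\psi$ open links) are assumptions of the sketch; the variable `lam` stands for $\Lambda$, not for the degree exponent.

```python
def simulate_rr(n, psi, r_stop):
    """Iterate the mean-field recursions (A1) and (A2) for a random regular network.

    Assumes <k(r)> = psi and branching factor psi - 1 for all r.
    Returns (r, lam) at the first step where r <= r_stop.
    """
    r = 1.0 - 1.0 / n        # the root is already part of the aggregate
    lam = psi / n            # open links of the root, in units of N
    while r > r_stop:
        p_free = r * psi / (r * psi + lam)   # probability of scenario (i)
        r -= p_free / n                                              # Eq. (A1)
        lam += (-1.0 + (psi - 1) * p_free - (1.0 - p_free)) / n      # Eq. (A2)
    return r, lam


def closed_form(r, psi):
    """Solution of (A5) for a random regular network with Lambda(1) = 0."""
    return psi * (r ** (2.0 / psi) - r)


if __name__ == "__main__":
    r, lam = simulate_rr(100000, 3, 0.5)
    print(r, lam, closed_form(r, 3))
```

The two printed values of $\Lambda$ should agree up to corrections of order $1/N$, which supports the signs and the factor of 2 in the equation for $d\Lambda/dr$.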

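$\Lambda_\ell(t)$ behaves similarly to $\Lambda(t)$, except that we only need to consider the effect of scenario (iii) of Fig. 1. Accordingly, the evolution of $\Lambda_\ell(t)$ can be written as

$$\Lambda_\ell(t+1) = \Lambda_\ell(t) - \frac{1}{N} - \frac{1}{N}\,\frac{\Lambda_\ell(t)}{r(t)\langle k(r(t))\rangle + \Lambda(t)}, \tag{A6}$$

which for $N \to \infty$ is

$$\frac{d\Lambda_\ell(t)}{dt} = -\frac{1}{N} - \frac{1}{N}\,\frac{\Lambda_\ell(t)}{r(t)\langle k(r(t))\rangle + \Lambda(t)}. \tag{A7}$$

Dividing Eq. (A7) by Eq. (A3), we get

$$\frac{d\Lambda_\ell(r)}{dr} = 1 + \frac{\Lambda(r)}{r\langle k(r)\rangle} + \frac{\Lambda_\ell(r)}{r\langle k(r)\rangle}. \tag{A8}$$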

APPENDIX B: SOLUTION OF $G_1(f_\infty) = f_\infty$ AND $\delta = -\ln G_1'(f_\infty)/\ln \tilde{k}$

The numerical solution of $G_1(f_\infty) = f_\infty$ can be illustrated by a simple example. Suppose we have three simple networks A, B and C. In network A, the nodes can only have degree 1, 2 or 3. In network B, the degree can be 2, 3 or 4. In network C, the degree can be 3, 4 or 5. In all three examples, the probability of each degree is 1/3. We can write $G_0$ and $G_1$ for the three networks as follows:

$$G_{0,A}(x) = \tfrac{1}{3}x + \tfrac{1}{3}x^2 + \tfrac{1}{3}x^3, \tag{B1}$$

$$G_{0,B}(x) = \tfrac{1}{3}x^2 + \tfrac{1}{3}x^3 + \tfrac{1}{3}x^4, \tag{B2}$$

$$G_{0,C}(x) = \tfrac{1}{3}x^3 + \tfrac{1}{3}x^4 + \tfrac{1}{3}x^5. \tag{B3}$$



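The average degrees $\langle k\rangle = G_0'(1)$ of A, B and C are 2, 3 and 4, respectively. Using the above expressions for $G_0$ we can construct the expressions for $G_1(x) = G_0'(x)/\langle k\rangle$:

$$G_{1,A}(x) = \tfrac{1}{6} + \tfrac{1}{3}x + \tfrac{1}{2}x^2, \tag{B4}$$

$$G_{1,B}(x) = \tfrac{2}{9}x + \tfrac{1}{3}x^2 + \tfrac{4}{9}x^3, \tag{B5}$$

$$G_{1,C}(x) = \tfrac{1}{4}x^2 + \tfrac{1}{3}x^3 + \tfrac{5}{12}x^4. \tag{B6}$$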

The branching factors $\tilde{k} = G_1'(1)$ of A, B and C are 4/3, 20/9 and 19/6, respectively. The numerical solutions of $G_1(f_\infty) = f_\infty$ for networks A and B are shown in Figs. 9(a) and 9(b), where we plot the functions $y = f$ and $y = G_1(f)$ on the same plot. From Fig. 9 we can see that there is a non-zero solution $f_\infty = 1/3$ for network A and $f_\infty = 0$ for network B. For network C, we also have $f_\infty = 0$. Whether we can have a non-zero $f_\infty$ depends on the first term of $G_1(x)$, which depends on $P(k = 1)$, the probability of having nodes with degree 1. If $P(k = 1) \neq 0$, we can have $f_\infty \neq 0$; if $P(k = 1) = 0$, then $f_\infty = 0$. Using Eq. (40), we can calculate $\delta_A = \ln(3/2)/\ln(4/3) \approx 1.41$, $\delta_B = \ln(9/2)/\ln(20/9) \approx 1.88$ and $\delta_C = \infty$. Networks A and B have finite $\delta$, while for network C, $G_1'(0) = 0$ and thus $\delta_C = \infty$. In order to have finite $\delta$, $P(k = 2) + P(k = 1)$ must be greater than 0. If $P(k = 2) = P(k = 1) = 0$ (called the Böttcher case [20]), then $\delta = \infty$, which indicates that $\Phi(y)$ has an exponential singularity. For the Böttcher case, the distribution of $B_\ell$ is not described by a power law, i.e., there are no fractal boundaries.
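
The numbers quoted in this appendix can be reproduced with a few lines of code. The sketch below is a minimal illustration, not the procedure used for Fig. 9: it builds $G_1(x) = G_0'(x)/\langle k\rangle$ from a degree distribution, finds $f_\infty$ by simple fixed-point iteration $f \leftarrow G_1(f)$ started from $f = 0.5$ (our choice of method), and evaluates $\tilde{k} = G_1'(1)$ and $\delta = -\ln G_1'(f_\infty)/\ln\tilde{k}$ as given in the title of this appendix. All function names are hypothetical.

```python
import math


def g1_coefficients(pk):
    """Coefficients of G1(x) = sum_k k P(k) x**(k-1) / <k>; entry n multiplies x**n."""
    mean_k = sum(k * p for k, p in pk.items())
    coeffs = [0.0] * max(pk)
    for k, p in pk.items():
        coeffs[k - 1] = k * p / mean_k
    return coeffs


def evaluate(coeffs, x):
    """Horner evaluation of a polynomial given by its coefficient list."""
    total = 0.0
    for c in reversed(coeffs):
        total = total * x + c
    return total


def analyse(pk, iterations=500):
    """Return (f_inf, branching factor, delta) for the degree distribution pk."""
    g1 = g1_coefficients(pk)
    dg1 = [n * c for n, c in enumerate(g1)][1:]   # coefficients of G1'(x)
    f = 0.5
    for _ in range(iterations):                   # fixed-point iteration f <- G1(f)
        f = evaluate(g1, f)
    k_tilde = evaluate(dg1, 1.0)
    slope = evaluate(dg1, f)                      # G1'(f_inf)
    delta = math.inf if slope == 0 else -math.log(slope) / math.log(k_tilde)
    return f, k_tilde, delta


if __name__ == "__main__":
    third = 1.0 / 3.0
    for name, degrees in (("A", (1, 2, 3)), ("B", (2, 3, 4)), ("C", (3, 4, 5))):
        print(name, analyse({k: third for k in degrees}))
```

Starting the iteration anywhere in the open interval (0, 1) leads to the smallest fixed point, which is the relevant $f_\infty$ here; the trivial solution $f = 1$ is avoided because it is unstable whenever $\tilde{k} > 1$.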
FIG. 1: (Color online) Building the network begins from a randomly chosen node (root), shown in red at the center of the figure. This schematic illustration shows the network during the building of shell $\ell + 1$. We do not start to build shell $\ell + 1$ until shell $\ell$ is completed. All the nodes that are already included in shell $\ell + 1$ are shown in blue, while the free nodes not yet connected in shell $\ell + 1$ are shown in purple. At a given time step, in order to connect an open link from shell $\ell$ to another open link, we must consider three scenarios: (i) connecting to an open link taken from a free node; (ii) connecting to an open link from shell $\ell + 1$; (iii) connecting to another open link from shell $\ell$. In this way the aggregate keeps growing shell after shell until all the open links are connected. Note that in scenarios (ii) and (iii) there is a chance of creating parallel links (two links connecting the same pair of nodes) and circular links (one link with both ends connected to the same node). For a large network with a finite $\tilde{k}$, such events occur with negligible probability.
FIG. 2: The degree distribution, $P_r(k)$, in $E_r$ for (a) an ER network with $N = 10^6$, $\langle k\rangle = 6$ and $r = 1$, 0.5 and 0.05. The simulation results (symbols) agree very well with the theoretical predictions (lines) of Eq. (22). (b) A SF network with $\lambda = 3.5$, $k_{\min} = 2$ and $N = 10^6$, with $r = 1$, 0.5 and 0.1. The simulation results (symbols) agree well with the theoretical predictions of Eq. (16). For the SF network, we compute Eq. (16) numerically using the $P(k)$ obtained from the generated network.




FIG. 3: Average degree $\langle k(r)\rangle$ of the nodes in $E_r$ as a function of $r$ for (a) four ER networks with different values of $\langle k\rangle$, and (b) four SF networks with $k_{\min} = 2$ and different values of $\lambda$. The symbols represent the simulation results for ER and SF networks of size $N = 10^6$. The lines in (a) represent Eq. (21). The lines in (b) are the numerical results of Eq. (17), using the degree distribution obtained from the networks.
FIG. 4: (Color online) For a randomly chosen root in the network, the fraction of nodes $r_{\ell+1}$ in $E_{\ell+1}$ as a function of the fraction of nodes $r_\ell$ in $E_\ell$ for (a) three RR networks of size $N = 10^5$ with different $\psi$. The red lines represent the theoretical prediction of Eq. (35). (b) Four ER networks of size $N = 10^5$ with different $\langle k\rangle$. The red lines represent the theoretical predictions of Eq. (36). (c) Five SF networks of size $N = 10^5$ with different values of $\lambda$. The red lines are the numerical results of Eq. (34), using the degree distribution obtained from the simulation. For $\lambda > 2.5$, the agreement between the theory [Eq. (34)] and the simulation results is perfect. (d) A SF network of size $N = 10^5$ with $\lambda = 2.2$, which allows parallel and circular links (PCL) during its construction. Simulation results for SF networks with PCL show excellent agreement with the theory (full line).
FIG. 5: Cumulative distribution function of $P(B_\ell)$, the distribution of the number of nodes $B_\ell$ in shell $\ell$ ($\ell \gg d$), for an ER network with $\langle k\rangle = 4$, $N = 10^6$ and $d \approx 10.0$, a SF network with $\lambda = 2.5$, $N = 10^6$ and $d \approx 4.7$, the HEP network ($d \approx 4.2$) and the DIMES network ($d \approx 3.3$). Note that a slope of $-1$ of the cumulative distribution function implies $P(B_\ell) \sim B_\ell^{-2}$, which holds for all four examples, as well as for many other networks studied [14].




FIG. 6: The average number of nodes, $\langle B_\ell\rangle$, in shell $\ell$ as a function of the shell index $\ell$ for RR networks with different $\psi$. The theoretical predictions (full lines) calculated from Eqs. (52) and (53) fit the simulation results (symbols) very well.
FIG. 7: $c(r_\ell)$ for SF networks. (a) The case $\lambda = 2.2$. Values of $k_{\max}$ equal to and larger than the natural cutoff ($k_{\max} = k_{\min} N^{1/(\lambda-1)} \approx 2 \times 10^5$) are compared for networks with and without parallel and circular links (PCL). Note that the discontinuity of the lines is due to the existence of large-degree nodes. (b) The case $\lambda = 2.5$. Similarly, values of $k_{\max}$ equal to and larger than the natural cutoff ($k_{\max} \approx 2 \times 10^4$) are compared for networks with and without PCL.
FIG. 8: $c(r_\ell)$ for various networks. (a) WS networks with $\psi = 4$ and $\beta = 0.3$ and 0.5. (b) Four real networks: Actor collaboration network (Actor), High Energy Physics citations network (HEP), AS Internet network (DIMES), and Supreme Court Citation network (SCC). In the inset we show the enlarged area of $r > 0.9$. (c) BA networks of size $N = 10^6$ with different $k_{\min}$.







1, 2 and 3, and (b) network B, with equal probability of having degree 2, 3 and 4. For network A, a non-zero solution $f_\infty$ can be seen. For network B, $f_\infty = 0$ is the solution of $G_1(f_\infty) = f_\infty$.